129 research outputs found

    Comparing similar ordered trees in linear-time

    We describe a linear-time algorithm for comparing two similar ordered rooted trees with node labels. The method for comparing trees is the usual tree edit distance. We show that an optimal mapping that uses at most k insertions or deletions can be constructed in O(nk^3) time, where n is the size of the trees. The approach is inspired by the Zhang–Shasha algorithm for tree edit distance, combined with an adequate pruning of the search space based on the tree edit graph.

    Predicting transcription factor binding sites using local over-representation and comparative genomics

    BACKGROUND: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. RESULTS: We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. CONCLUSION: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at

    Decomposition algorithms for the tree edit distance problem

    We study the behavior of dynamic programming methods for the tree edit distance problem, such as [P. Klein, Computing the edit-distance between unrooted ordered trees, in: Proceedings of 6th European Symposium on Algorithms, 1998, p. 91–102; K. Zhang, D. Shasha, SIAM J. Comput. 18 (6) (1989) 1245–1262]. We show that those two algorithms may be described as decomposition strategies. We introduce the general framework of cover strategies, and we provide an exact characterization of the complexity of cover strategies. This analysis allows us to define a new tree edit distance algorithm that is optimal for cover strategies.
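
    As a rough illustration of what a decomposition strategy looks like, the sketch below computes the unit-cost edit distance between two ordered forests by always decomposing at the rightmost root. It is one fixed strategy written as a plain memoized recursion, not the cover-strategy algorithm of the paper. The tuple encoding of trees and the name `forest_distance` are assumptions made for this example.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees. Tuples keep everything hashable.

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between ordered forests f and g."""
    if not f and not g:
        return 0
    if not f:
        # Only insertions remain: insert the rightmost root of g,
        # which promotes its children into the forest.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Only deletions remain.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1

    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    # Delete the rightmost root of f.
    delete = forest_distance(f[:-1] + kids_v, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + kids_w) + 1
    # Map the two rightmost roots onto each other (relabel if they differ),
    # then solve the children and the remaining forests independently.
    match = (forest_distance(kids_v, kids_w)
             + forest_distance(f[:-1], g[:-1])
             + (label_v != label_w))
    return min(delete, insert, match)


def tree_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))
```

    For instance, with `t1 = ("a", (("b", ()), ("c", ())))` and `t2 = ("a", (("b", ()),))`, `tree_distance(t1, t2)` counts the single deletion of the leaf `c`. Always choosing the rightmost root keeps the recursion simple; the point of the strategy framework in the abstract is that this choice affects how many subproblems are visited.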

    Lossless seeds for searching short patterns with high error rates

    We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is particularly well suited to short patterns with a large number of errors.
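
    To make the problem statement concrete, here is a minimal sketch of the classical dynamic-programming baseline (often attributed to Sellers) that reports every position in T where some substring ending there is within k Levenshtein errors of P. It is not the seed-based filtration algorithm of the paper, only the exhaustive check that such filters aim to avoid running everywhere. The function name `approx_find` is an assumption made for this example.

```python
def approx_find(text, pattern, k):
    """Return end positions j (1-based, i.e. text[:j] ends the match) such that
    some substring of text ending at j is within k edits of pattern."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]   # value of row i-1 in the previous column
        col[0] = 0           # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]     # value of row i in the previous column
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra text character
                col[i - 1] + 1,                     # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

    A call such as `approx_find("abcdefg", "cde", 1)` lists the end positions of all approximate occurrences with at most one error. The running time is proportional to the product of the text and pattern lengths, which is the cost a lossless filter tries to avoid paying on the whole text.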

    Algebraic Dynamic Programming 2.0

    We present an as yet unpublished generalization of the Algebraic Dynamic Programming (ADP) framework which also accommodates problems on trees. The new framework is not an "add-on" extension, but suggests a reformulation of "classical" ADP. ADP 2.0 is not only more general but (arguably) more elegant, and more difficult to implement. In fact, no general implementation technique is known at present. An ensemble of about 30 problems, most of them classical and some new, from biosequence and structure analysis has been described in the new framework.

    RNA Locally Optimal Secondary Structures

    RNA locally optimal secondary structures provide a concise and exhaustive description of all possible secondary structures of a given RNA sequence, and hence a very good representation of the RNA folding space. In this paper, we present an efficient algorithm that computes all locally optimal secondary structures for any folding model that takes into account the stability of helical regions. This algorithm is implemented in software called regliss, which runs on a publicly accessible web server.

    Bloom Filter Trie - a data structure for pan-genome storage

    Holley G, Wittler R, Stoye J. Bloom Filter Trie - a data structure for pan-genome storage. In: Pop M, Touzet H, eds. Algorithms in Bioinformatics. WABI 2015. Proceedings. Lecture Notes in Computer Science. Vol 9289. Berlin, Heidelberg: Springer; 2015: 217-230.

    Décoder le génome : vers la compréhension du fonctionnement du SARS-CoV-2 [Decoding the genome: towards understanding how SARS-CoV-2 works]

    A long string of letters. That is how a genome, such as that of SARS-CoV-2, is represented. But how can we make sense of this cryptic succession of A, C, G and T? Where are the genes? What roles do they play? Bioinformatics tools make it possible to draw on the knowledge acquired about other coronaviruses and transfer it to SARS-CoV-2.

    Comment la bioinformatique a résolu le puzzle du génome du SARS-CoV-2 [How bioinformatics solved the puzzle of the SARS-CoV-2 genome]

    Knowing the genome of SARS-CoV-2 was a fundamental step in the fight against the Covid-19 epidemic. It made it possible to quickly identify the virus's proteins, develop tests, study its origin, track its evolution, and so on. But how, starting from a simple swab covered with a variety of organisms, do we manage to determine the genome of the virus of interest? Bioinformatics offers methods suited to doing this very efficiently.

    Searching for alternate RNA structures in genomic sequences

    We introduce the concept of RNA multi-structures, a formal-grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can be a set of suboptimal foldings, distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multi-structures within a genomic sequence.